fix(cluster): strip signing_key from /api/cluster/workers by jaylfc · Pull Request #296 · jaylfc/tinyagentos

jaylfc · 2026-05-01T16:54:59Z

Summary

`asdict(WorkerInfo)` includes the worker's raw HMAC `signing_key: bytes`. FastAPI's encoder utf-8-decodes bytes fields and crashes on non-utf8 key material. Result: `/api/cluster/workers` returns 500 every call, the cluster widget on the dashboard goes blank, the Fedora worker disappears from the UI even though heartbeats are flowing.
Strip `signing_key` from the response. The HMAC secret has no place on the wire even when serialization works.

Repro (pre-fix)

Have any worker registered (Pi controller + Fedora worker)
Open the dashboard or hit `GET /api/cluster/workers` directly
500 `UnicodeDecodeError` on every call

Test plan

Reproduced on Pi
Lint clean
After merge + Pi rebuild, Fedora worker reappears in cluster widget

Summary by CodeRabbit

Release Notes

New Features
- Agents now receive personalized system instructions that adapt based on channel type and team member roles, enabling more context-aware responses.
Bug Fixes
- Removed sensitive credentials from worker API responses to enhance security.
Tests
- Added extensive test coverage for agent instruction generation and message routing context handling.

…ispatch Adds agent_manual.py with build_manual() — a pure function that returns a per-agent operating manual (≤500 tokens) covering @-mention routing, project task verbs, and lead/non-lead status. The router prepends it as role:system before the conversation history on every chat dispatch so agents always know the channel primitives without per-session briefs.

asdict(WorkerInfo) dumps the worker's raw HMAC signing_key bytes into the response. FastAPI's default jsonable_encoder maps bytes via o.decode() with implicit utf-8 — random key material has bytes outside the utf-8 range, so the entire endpoint 500s. Visible symptom: workers disappear from the cluster widget on the dashboard because the list endpoint never succeeds. Beyond the crash, leaking the HMAC signing key over the API has no legitimate purpose. Strip it from the response. Confirmed on Pi: GET /api/cluster/workers was returning 500 with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2..." every poll. Fedora worker (192.168.6.108) heartbeats are fine — the bug was purely in the list-side serialization.

coderabbitai · 2026-05-01T16:55:12Z

📝 Walkthrough

Walkthrough

This pull request introduces an agent-specific system-prompt injection feature via a new build_manual module that generates conditional operating manual text based on channel type and lead membership. The agent chat router now derives lead lists from channel settings and prepends generated manuals to per-agent contexts. A minor security fix strips signing keys from the worker API response.

Changes

Cohort / File(s)	Summary
Agent Manual System `tinyagentos/agent_manual.py`, `tinyagentos/agent_chat_router.py`	New `build_manual` function generates markdown system prompts with conditional sections (routing guidance, kanban verbs, lead/non-lead instructions) based on channel type and project context. Router integrates by deriving `leads` from settings and injecting the manual as a prepended system message.
Agent Manual Tests `tests/test_agent_manual.py`, `tests/test_agent_chat_router.py`	New test suites verify manual generation for DM/group/project channels, validate lead vs. non-lead branching, enforce token-budget constraints (≤200 words for DM, ≤500 for project), and confirm router correctly injects the manual as the first system message in agent context.
Worker API Security `tinyagentos/routes/cluster.py`	`list_workers` endpoint now strips raw `signing_key` from serialized worker data to prevent serialization failures and credential exposure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(chat): always-on system context manual injected into every agent dispatch #295: Implements the same agent-manual feature with matching build_manual function and per-agent context injection in router, plus corresponding test coverage for lead/non-lead branching and channel-type behavior.

Poem

🐰 A rabbit hops through channels bright,
Crafting manuals left and right,
Leads and non-leads each get their way,
System prompts guide agents every day! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title 'fix(cluster): strip signing_key from /api/cluster/workers' directly describes the main change in tinyagentos/routes/cluster.py. However, the changeset includes four additional file changes (two new test files for agent_manual and agent_chat_router, and one modification to agent_chat_router.py) that are unrelated to the cluster fix and not reflected in the title.	Either split this into separate PRs (cluster fix vs. agent manual/router tests), or update the title to reflect all major changes (e.g., 'feat: add agent manual system and cluster signing_key fix').
Docstring Coverage	⚠️ Warning	Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/cluster-workers-signing-key-leak

_{Review rate limit: 9/10 reviews remaining, refill in 6 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

kilo-code-bot · 2026-05-01T16:56:46Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (5 files)

tests/test_agent_chat_router.py - Added tests for system context injection
tests/test_agent_manual.py - Comprehensive tests for agent manual generation
tinyagentos/agent_chat_router.py - Integrated manual injection into routing logic
tinyagentos/agent_manual.py - New module for building operating manuals
tinyagentos/routes/cluster.py - Fixed signing key exposure in API response

_{Reviewed by grok-code-fast-1:optimized:free · 55,589 tokens}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tinyagentos/routes/cluster.py (2)
19-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix mutable default values in WorkerRegister (shared state across requests).

WorkerRegister defines mutable defaults:

hardware: dict = {} (Line 22)

backends: list[dict] = [] (Line 23)

models: list[str] = [] (Line 24)

capabilities: list[str] = [] (Line 25)

Even if Pydantic often copies defaults, relying on that is risky: mutable defaults can lead to cross-request data leakage / unexpected mutation when any code mutates these values.

Suggested fix: switch to Field(default_factory=...) for each mutable field.
🛠️ Proposed fix
-from pydantic import BaseModel
+from pydantic import BaseModel, Field

 class WorkerRegister(BaseModel):
     name: str
     url: str
-    hardware: dict = {}
-    backends: list[dict] = []
-    models: list[str] = []
-    capabilities: list[str] = []
+    hardware: dict = Field(default_factory=dict)
+    backends: list[dict] = Field(default_factory=list)
+    models: list[str] = Field(default_factory=list)
+    capabilities: list[str] = Field(default_factory=list)
     platform: str = ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/routes/cluster.py` around lines 19 - 33, WorkerRegister uses
mutable default values for hardware, backends, models, and capabilities which
can cause shared state; update these fields to use Pydantic's Field with
default_factory (e.g., Field(default_factory=dict) for hardware and
Field(default_factory=list) for backends/models/capabilities) and import Field
from pydantic, leaving other fields and kv_* defaults unchanged so each model
instance gets its own fresh collection.
66-89: ⚠️ Potential issue | 🟠 Major

API contract mismatch: list_workers returns bare array, but consumer expects wrapped object.

The endpoint returns a bare JSON array, but tinyagentos/scheduling/resource_manager.py (line 112) calls .get("workers", []) expecting an object shape {"workers": [...]}. When the actual list response is received, .get() fails silently due to exception handling and returns [], causing cluster workers to never be discovered during remote resource checks.

Fix: In resource_manager.py line 112, change resp.json().get("workers", []) to resp.json() to match the actual endpoint response.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/routes/cluster.py` around lines 66 - 89, The remote workers API
(list_workers) returns a bare JSON array but the client is calling
resp.json().get("workers", []), causing discovery to fail; locate the call that
uses resp.json().get("workers", []) (the code that fetches remote cluster
workers) and replace that expression with resp.json() so the client treats the
response as the list itself (ensure the returned value is assigned to the
variable used as the workers list).

🧹 Nitpick comments (1)

tinyagentos/routes/cluster.py (1)
81-87: ⚡ Quick win

Avoid mutating worker state during a GET response (unless intentional + safe).

Inside list_workers, when registry is not None, the code both:

computes tier_id / potential_capabilities for the response (Lines 82-85), and

mutates in-memory worker state to “keep WorkerInfo fields in sync” (Lines 86-87).

If multiple requests come in concurrently (or other code relies on these fields being updated only during heartbeat/registration), this can cause subtle racey/ordering behavior and makes GET have side effects.

If you want to persist these derived fields, consider moving that update to the heartbeat/registration path; otherwise compute only into d without touching w.tier_id / w.potential_capabilities.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/routes/cluster.py` around lines 81 - 87, In list_workers, avoid
mutating in-memory worker state during a GET: remove the assignments to
w.tier_id and w.potential_capabilities so the handler only computes tier_id and
pot_caps via _potential_capabilities(hardware, registry) and writes them into
the response dict d; if you need to persist these derived fields instead, move
the w.tier_id/w.potential_capabilities updates into the worker
heartbeat/registration code path (where WorkerInfo is intentionally updated)
rather than inside list_workers.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tinyagentos/routes/cluster.py`:
- Around line 19-33: WorkerRegister uses mutable default values for hardware,
backends, models, and capabilities which can cause shared state; update these
fields to use Pydantic's Field with default_factory (e.g.,
Field(default_factory=dict) for hardware and Field(default_factory=list) for
backends/models/capabilities) and import Field from pydantic, leaving other
fields and kv_* defaults unchanged so each model instance gets its own fresh
collection.
- Around line 66-89: The remote workers API (list_workers) returns a bare JSON
array but the client is calling resp.json().get("workers", []), causing
discovery to fail; locate the call that uses resp.json().get("workers", []) (the
code that fetches remote cluster workers) and replace that expression with
resp.json() so the client treats the response as the list itself (ensure the
returned value is assigned to the variable used as the workers list).

---

Nitpick comments:
In `@tinyagentos/routes/cluster.py`:
- Around line 81-87: In list_workers, avoid mutating in-memory worker state
during a GET: remove the assignments to w.tier_id and w.potential_capabilities
so the handler only computes tier_id and pot_caps via
_potential_capabilities(hardware, registry) and writes them into the response
dict d; if you need to persist these derived fields instead, move the
w.tier_id/w.potential_capabilities updates into the worker
heartbeat/registration code path (where WorkerInfo is intentionally updated)
rather than inside list_workers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c21cb44a-bd03-4e24-b789-2023963cceb1

📥 Commits

Reviewing files that changed from the base of the PR and between 5aaf8e3 and 9b4154e.

📒 Files selected for processing (5)

tests/test_agent_chat_router.py
tests/test_agent_manual.py
tinyagentos/agent_chat_router.py
tinyagentos/agent_manual.py
tinyagentos/routes/cluster.py

jaylfc added 2 commits May 1, 2026 16:31

coderabbitai Bot reviewed May 1, 2026

View reviewed changes

jaylfc merged commit c33d764 into master May 1, 2026
8 checks passed

jaylfc deleted the fix/cluster-workers-signing-key-leak branch May 1, 2026 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cluster): strip signing_key from /api/cluster/workers#296

fix(cluster): strip signing_key from /api/cluster/workers#296
jaylfc merged 2 commits into
masterfrom
fix/cluster-workers-signing-key-leak

jaylfc commented May 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 1, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (2 warnings)

Uh oh!

kilo-code-bot Bot commented May 1, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

jaylfc commented May 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Repro (pre-fix)

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (2 warnings)

Uh oh!

kilo-code-bot Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jaylfc commented May 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 1, 2026 •

edited

Loading

kilo-code-bot Bot commented May 1, 2026 •

edited

Loading